Framing

Audience:

Style:

The Friends of the Earth - “England’s Green Space Gap”

Friends of the Earth (FoE) have recently released a report focused on “England’s Green Space Gap.” The headline finding of the report is that one in five people in England live in areas where it is difficult to access green space. The report also provides a holistic overview of why green space is so important, by highlighting how individuals and communities benefit from having access to both public and private green space. These benefits stretch far beyond the natural environment itself, and encompass a myriad of social, health and economic benefits.

As part of the research underpinning the Green Space Gap report, Friends of the Earth have developed a new approach for classifying the extent to which neighborhoods (or Middle Layer Super Output Areas, MSOAs, in the terminology of the administrative geography) across England experience green space deprivation. Neighborhoods are classified into five groups, with group A including the least green space deprived neighborhoods and group E the most green space deprived.

Friends of the Earth have released the dataset that they developed and used to classify green space deprivation within the Green Space Gap report. In this notebook I plan to conduct an exploratory data analysis using this dataset. Before doing so, I think it might be helpful to outline how Friends of the Earth processed the dataset. This is outlined in the figure below and involved the following steps:

Producing the Friends of the Earth Green Space Deprivation ratings.

Full details of the methodology used by Friends of the Earth can be found on page 36 of the Green Space Gap report.

n.b. In the report, Friends of the Earth draw on the Index of Multiple Deprivation (IMD) dataset to explore the relationship between the green space deprivation ratings and demographic factors including ethnicity and income.

The Structure of this Exploratory Data Analysis

Reading the Green Space Gap report and exploring the associated dataset, I was struck by a number of questions about the nature and scope of green space deprivation in England. I thought that these questions might be a good basis for my exploratory data analysis.

Below I address each of these questions in turn, with the aim of extending the analysis of the data presented in the report. By doing so, I hope to contribute to the wider debate on maintaining and extending access to green space during the post-COVID recovery.

Datasets Used

Before moving on to the exploratory data analysis itself, I thought it would be helpful to very briefly document the datasets I used. These include the Friends of the Earth dataset and additional datasets from the ONS which proved interesting or helpful in the context of my exploratory data analysis. In particular, I thought it was worth recording which versions of the datasets I used where multiple versions are available from the ONS.

variable_name | file_name | notes | url
green_space | (FOE) Green Space Consolidated Data - England - Version 2.1.xlsx | | https://friendsoftheearth.uk/nature/green-space-consolidated-data-england
LAD_to_region | Local_Authority_District_to_Region__December_2019__Lookup_in_England.csv | used the December 2019 version | https://geoportal.statistics.gov.uk/datasets/3ba3daf9278f47daba0f561889c3521a_0
urban_rural_classification | RUC_MSOA_2001_EW_LU.csv | 2001 was the latest version available | https://geoportal.statistics.gov.uk/datasets/rural-urban-classification-2001-of-msoas-in-england-and-wales

Before conducting the exploratory data analysis I imported the three datasets and then merged them into the single dataframe shown below. I have retained all the variables from the Friends of the Earth dataset in this dataframe, including non-green space variables from the Indices of Multiple Deprivation, in case they prove useful later in the analysis.
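The merge code itself is not shown here, so the following is only a minimal Python sketch of the idea, assuming hypothetical key columns msoa_code and lad_code (not the real FoE or ONS headers). It performs a left join, the operation dplyr::left_join or a pandas merge with how="left" would carry out, so that every FoE neighborhood is kept even when a lookup has no match.

```python
from typing import Dict, List


def merge_datasets(green_space: List[dict],
                   lad_to_region: Dict[str, str],
                   msoa_to_ruc: Dict[str, str]) -> List[dict]:
    """Left-join FoE rows onto the two ONS lookups.

    'msoa_code' and 'lad_code' are hypothetical key names used for illustration.
    Every FoE row is kept; unmatched lookups produce a missing value instead
    of dropping the neighborhood.
    """
    merged = []
    for row in green_space:
        out = dict(row)  # copy so the input rows are not mutated
        out["region"] = lad_to_region.get(row["lad_code"])  # None if no match
        # MSOAs absent from the 2001 lookup get the 'NA_' label used later on
        out["urban_rural"] = msoa_to_ruc.get(row["msoa_code"], "NA_")
        merged.append(out)
    return merged
```

With a left join, any MSOA missing from the 2001 urban-rural lookup still appears in the merged data, which is presumably how a category like NA_ arises.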

What is the scale of the green space deprivation problem in England?

This exploratory data analysis begins by focusing on the green space deprivation ratings of each English neighborhood. In this section of the analysis, I do not drill down into the underlying data that informs the ratings. More detailed exploration of the underlying data is picked up in the later sections of this notebook. But initially I wanted to get a better understanding of the green space deprivation ratings themselves, and their potential implications.

The first question I turn to is: how many neighborhoods are considered green space deprived in the Friends of the Earth analysis? The plot below shows the number of neighborhoods classified in each category. Reviewing the plot I noted that:

There are only a small number of categories that a rating can fall into (A-E), and the ratings are based on simplified and abstracted representations of the ONS green space data (i.e. the green space scores described in the introduction section above). So, I do not think there is much value at this stage in considering descriptive statistics which summarise the distribution of green space deprivation ratings. At some point it might be worth considering how the simplifications/abstractions used have affected the distributions of the ONS green space data, but I leave this to one side for now.

I was interested in understanding in more detail the numbers and proportions of both neighborhoods and people affected by green space deprivation. To this end I produced the table below. The table separates the ratings into two groups:

Reviewing the table I noted:

Green Space Deprivation in England
Understanding the scale of the problem

Green Space Deprivation Rating | Neighbourhoods (number) | Neighbourhoods (%) | Population (millions) | Population (%)
Urgent action needed to improve access to green space
E | 1,108 | 16 | 9.62 | 18
D | 955 | 14 | 8.21 | 15
Total | 2,063 | 30 | 17.84 | 33
Action needed to protect access to green space
C | 1,727 | 25 | 13.54 | 25
B | 1,360 | 20 | 10.77 | 20
A | 1,641 | 24 | 12.58 | 23
Total | 4,728 | 70 | 36.89 | 67
Source: Friends of the Earth
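The percentages in this table are straightforward to recompute from the counts. The following minimal Python sketch does that, using the rating groups shown above. The function and variable names are illustrative. Each percentage is taken against the England-wide total, and a subtotal row is added per group.

```python
from typing import Dict, List, Tuple

# Rating groups as presented in the table above
GROUPS: Dict[str, Tuple[str, ...]] = {
    "Urgent action needed to improve access to green space": ("E", "D"),
    "Action needed to protect access to green space": ("C", "B", "A"),
}

# (label, neighbourhoods, neighbourhoods %, population in millions, population %)
Row = Tuple[str, int, int, float, int]


def scale_table(n_by_rating: Dict[str, int],
                pop_m_by_rating: Dict[str, float]) -> List[Row]:
    """Counts and whole-number percentages per rating, plus a total per group."""
    total_n = sum(n_by_rating.values())
    total_pop = sum(pop_m_by_rating.values())

    def row(label: str, n: int, pop: float) -> Row:
        return (label, n, round(100 * n / total_n), pop, round(100 * pop / total_pop))

    rows: List[Row] = []
    for ratings in GROUPS.values():
        for r in ratings:
            rows.append(row(r, n_by_rating[r], pop_m_by_rating[r]))
        rows.append(row("Total",
                        sum(n_by_rating[r] for r in ratings),
                        sum(pop_m_by_rating[r] for r in ratings)))
    return rows
```

Feeding in the neighbourhood counts and population figures from the table reproduces its percentage columns. Because the population inputs are already rounded to two decimals, the recomputed subtotal for D and E comes out as 17.83 rather than 17.84.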

How is green space deprivation distributed across regions in England?

Having explored how many neighborhoods and people are affected by green space deprivation across England as a whole, I was interested to understand whether communities and people in some regions are more affected than others. In turn, this could provide an indication of where action to alleviate green space deprivation is most needed. The plot below shows, for each region, the numbers of neighborhoods receiving each green space deprivation rating. Reviewing the plot I noted:

I was interested to understand in a little more detail where the neighborhoods receiving the highest green space deprivation ratings (D and E) are located. Below I plotted how the neighborhoods receiving a given rating (in this case D or E) are distributed proportionally across the English regions. Doing this involved addressing the challenge of ensuring that the colour associated with a given region was applied consistently across the two plots. This blog on How to map a colour to a value of a categorical variable … was very helpful in addressing this challenge.
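The idea behind a consistent colour mapping is to fix one lookup from region to colour and reuse it for every plot, rather than letting each plot assign colours by position. The following minimal Python sketch shows this approach alongside the D/E share calculation. The hex colours are arbitrary choices made for illustration, and the input format of (region, rating) pairs is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One fixed palette for the nine English regions (colours chosen arbitrarily)
REGION_COLOURS: Dict[str, str] = {
    "North East": "#1b9e77",
    "North West": "#d95f02",
    "Yorkshire and The Humber": "#7570b3",
    "East Midlands": "#e7298a",
    "West Midlands": "#66a61e",
    "East of England": "#e6ab02",
    "London": "#a6761d",
    "South East": "#666666",
    "South West": "#1f78b4",
}


def share_by_region(rows: Iterable[Tuple[str, str]],
                    ratings: Tuple[str, ...] = ("D", "E")) -> Dict[str, float]:
    """Percentage of all neighborhoods with the given ratings that fall in each region."""
    counts = Counter(region for region, rating in rows if rating in ratings)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: 100 * n / total for region, n in counts.items()}


def colours_for(regions: Iterable[str]) -> List[str]:
    """Look up each region's fixed colour; an unknown region raises KeyError."""
    return [REGION_COLOURS[r] for r in regions]
```

In ggplot2 the same idea is usually expressed by passing a named vector of colours to scale_fill_manual(values = ...), so that the mapping is keyed by region name rather than by factor order.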

Reviewing the plot below I noted that:

Is green space deprivation an urban problem?

Having explored how green space deprivation is distributed across the English regions, I was interested to dig a little deeper into the question of where (in a general rather than a geographic sense) green space deprivation is a problem. In particular, it seems intuitively to make sense that green space deprivation is primarily an urban problem. I wanted to see if this intuition is borne out by the data.

This involved finding a dataset which classified MSOAs (i.e. neighborhoods) by whether or not they can be considered urban. Finding the appropriate ONS dataset took a little time and effort, but in the end I found an urban-rural classification conducted in 2001. Obviously it is not ideal to use twenty-year-old data, as over that period some rural MSOAs on the edges of urban areas are likely to have become more developed. A more recent urban-rural classification was conducted, but the results do not appear to have been released as open data (the results are displayed on a web GIS).

The table below shows the breakdown of neighborhoods by both green space deprivation rating and the type of neighborhood as defined in the ONS dataset. Neighborhoods are classified into one of three categories: (1) Urban > 10K; (2) Town and Fringe; and (3) Village Hamlet & Isolated Dwellings. I also included an additional category for neighborhoods for which it was not possible to identify an urban-rural classification (see column NA_). The percentages in the table sum to 100% column-wise. That is to say, the percentages show how the neighborhoods with each urban-rural classification break down across the five green space deprivation ratings.

Reviewing the table I noted that:

green_space_deprivation_rating | Town and Fringe | Urban > 10K | Village Hamlet & Isolated Dwellings | NA_
A | 51% (309) | 14% (747) | 84% (566) | 13% (19)
B | 7% (41) | 24% (1287) | 0% (0) | 21% (32)
C | 41% (248) | 25% (1345) | 16% (110) | 16% (24)
D | 1% (6) | 17% (923) | 0% (0) | 17% (26)
E | 0% (2) | 20% (1056) | 0% (0) | 33% (50)
Total | 100% (606) | 100% (5358) | 100% (676) | 100% (151)
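A column-wise percentage crosstab like the one above can be produced in two steps. First, count (rating, classification) pairs. Second, divide each cell by its column total. The following minimal Python sketch assumes that the input is an iterable of (rating, classification) pairs, one per neighborhood. The function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def crosstab(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count neighborhoods per (classification column, rating) cell."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for rating, column in pairs:
        counts[column][rating] += 1
    return counts


def column_percentages(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Whole-number percentages so that each column sums to (about) 100%."""
    out: Dict[str, Dict[str, int]] = {}
    for column, by_rating in counts.items():
        total = sum(by_rating.values())
        out[column] = {rating: round(100 * n / total) for rating, n in by_rating.items()}
    return out
```

Applying column_percentages to the counts in the table reproduces its percentages, for example 51%, 7%, 41%, 1% and 0% for Town and Fringe.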

Reviewing the visual representation below of the relationship between the green space deprivation rating and the urban-rural classification of each neighborhood highlighted the following points.

What can the dataset tell us about what green space deprivation looks like in England?

Having focused on the green space deprivation ratings themselves so far, I was interested to understand more about the data that informed these ratings. The ratings are calculated using three green space scores (each ranging from 1 to 4); see page 36 of the report for more details. In turn, each of these scores was calculated from (what I have called) summary variables:

  1. The garden area per capita (m2 per person) within the neighborhood - hopefully fairly self-explanatory, and referred to as garden_area_per_capita in the dataset;
  2. The public green space area per capita (m2 per person) - again hopefully fairly self-explanatory (setting aside issues around the definition of public green space and how it might be identified), and referred to as green_space_area_per_capita in the dataset;
  3. The proportion of the population within the neighborhood within 5 minutes' walk of public green space - this is the most complex summary variable and is referred to as pcnt_pop_with_go_space_access in the dataset. For this variable FoE considered only public green spaces of two or more hectares in size. I am unsure how FoE calculated values for this variable; I assume some form of GIS analysis was involved. The definition of this variable is based on a Natural England standard. In turn, this standard is based on research indicating that people visit green space within five minutes' walk considerably more frequently than green space beyond five minutes' walk.
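The first two summary variables are simple ratios of area to population. The third is a population-weighted share. The following minimal Python sketch covers both calculations. For the third variable, the walking-distance flag stands in for whatever GIS analysis FoE actually used. The sub-area input format and all names are hypothetical.

```python
from typing import Iterable, List, Tuple


def per_capita(area_m2: float, population: int) -> float:
    """Area per person (m2 per person), e.g. garden or public green space area."""
    if population <= 0:
        raise ValueError("population must be positive")
    return area_m2 / population


def pcnt_pop_with_access(zones: Iterable[Tuple[int, bool]]) -> float:
    """Percentage of residents living in sub-areas flagged as having access.

    zones: hypothetical (population, within_5_min_of_2ha_space) pairs for the
    sub-areas of one MSOA. How the flag is derived (the GIS step) is assumed.
    """
    zone_list: List[Tuple[int, bool]] = list(zones)
    total = sum(pop for pop, _ in zone_list)
    with_access = sum(pop for pop, has_access in zone_list if has_access)
    return 100 * with_access / total
```

For example, an MSOA of 5,000 people with 250,000 m2 of garden would have 50 m2 of garden per capita.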

In this section of the notebook I explore the distributions of, and correlations between, these three summary variables. Throughout this exploration I am seeking to better understand what green space deprivation looks like in England, and how this helps me to better understand the FoE green space ratings. This, in turn, I hope will inform my thinking about how to use statistical methods (e.g. k-means or k-medians clustering) to identify clusters of neighborhoods with similar green space characteristics. In this section I do, however, set aside for now the green space score variables which FoE calculated from the summary variables, as each of the scores is essentially a simplified version of one of the three summary variables.

Before looking at each of the summary variables in more detail I take a very quick look at some descriptive statistics. Reviewing the table below, I noted that:

Distributions of the summary variables

Garden area per capita

The first summary variable I looked at in detail is average garden area per capita. The histogram below shows the numbers of neighborhoods with different average garden area per capita, giving a sense of how much garden space (per capita) English neighborhoods have. The frequency distribution is heavily right skewed: most neighborhoods have an average garden area per capita of less than 200 m2, followed by a long tail of relatively few neighborhoods with up to 800 m2. Note that a number of neighborhoods in the dataset fall outside the range of the x axis and are not displayed in the plot (see the caption for details).

Considering the relative distribution of garden space across neighborhoods in England is relatively straightforward. It is more challenging to understand the implications of this distribution. Developing such an understanding would require comparison with some benchmark values. My initial searches suggest that there may be limited research on how much garden space per capita is a “good” amount in terms of creating benefits, whether for health and well-being (e.g. relaxation and stress reduction) or the environment (e.g. through improved drainage and reduced flood risk).

In my admittedly fairly cursory searches to date I did find garden space guidelines within the Essex Design Guide, an evolving set of guidelines that has shaped development in the English county of Essex for over 45 years. The Essex Design Guide recommends a minimum garden size of 100 m2 per household, a figure said to allow for a wide range of household activities and the sunlight needed to grow plants. For smaller houses, where space is at a premium, the guide recommends 50 m2 per household. Sticking with the higher of the two figures as a benchmark, and making the crude assumption that on average a household is occupied by 2 people, 50 m2 of garden space per capita might be considered a minimum standard. The conversion between per household and per capita figures is needed because the FoE dataset provides only the latter.
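The conversion from a per household standard to a per capita benchmark is a single division. The following short Python sketch makes the assumed household size an explicit parameter, so that the sensitivity of the benchmark to that crude assumption is easy to see. The names are illustrative.

```python
def per_household_to_per_capita(m2_per_household: float,
                                people_per_household: float) -> float:
    """Convert a per household garden standard into m2 per person."""
    if people_per_household <= 0:
        raise ValueError("people_per_household must be positive")
    return m2_per_household / people_per_household


ASSUMED_PEOPLE_PER_HOUSEHOLD = 2.0  # the crude assumption used in the text

essex_standard = per_household_to_per_capita(100.0, ASSUMED_PEOPLE_PER_HOUSEHOLD)  # 50.0
small_house = per_household_to_per_capita(50.0, ASSUMED_PEOPLE_PER_HOUSEHOLD)      # 25.0
```

With 2.5 people per household instead of 2, the same 100 m2 standard would translate to 40 m2 per person, so the benchmark should be read as approximate.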

The FoE green space scores shown in the background of the histogram correspond to the quartiles of the garden_area_per_capita variable, with a score of 1 indicating neighborhoods in the first quartile (i.e. the bottom 25% for garden space per capita) and a score of 4 indicating neighborhoods in the fourth quartile (i.e. the top 25%). So, from the histogram below we can see that more than 75% of English neighborhoods have more than 50 m2 of garden space per capita. This in turn suggests that most of the English population may not be affected by severe green space deprivation due to a lack of garden space.
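Quartile-based scoring can be sketched in a few lines of Python. The following minimal sketch makes two assumptions. First, it uses type-7 quartiles, the default in R's quantile(), which correspond to Python's statistics.quantiles(method="inclusive"). Second, a value exactly on a cut point moves up into the next score. The exact boundary rule FoE used is not stated here. With this scoring, checking that more than 75% of neighborhoods exceed 50 m2 roughly amounts to checking that the first quartile cut point lies above 50 m2.

```python
import statistics
from bisect import bisect_right
from typing import List, Sequence


def quartile_scores(values: Sequence[float]) -> List[int]:
    """Score each value 1-4 by quartile (1 = bottom 25%, 4 = top 25%).

    Assumes type-7 quartiles and that ties on a cut point go to the higher score.
    """
    cuts = statistics.quantiles(values, n=4, method="inclusive")  # [Q1, Q2, Q3]
    return [bisect_right(cuts, v) + 1 for v in values]


def share_above(values: Sequence[float], threshold: float) -> float:
    """Percentage of values strictly above a benchmark such as 50 m2 per capita."""
    return 100 * sum(v > threshold for v in values) / len(values)
```

Applied to garden_area_per_capita, share_above(values, 50.0) would give the percentage of neighborhoods above the crude Essex-based benchmark directly, without reading it off the histogram.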

* An idea of sorts - trying to show something about how much of overall garden space is accessible to how many people


Reviewing the garden area plots above also led me to consider the following points (although I didn’t have the opportunity to explore them in detail).

  • The figures within the FoE dataset give the average (presumably the mean, although I need to check this) amount of garden area per capita for neighborhoods in England. With the data to hand it is not possible to understand the distribution of garden space within neighborhoods. Hence, it is possible that the use of average garden area per capita at the neighborhood level is obscuring skewed distributions of garden area within neighborhoods. This issue seems more likely to occur where neighborhoods have different types of housing stock within them, which in turn are likely to have different amounts of garden space. For example, a neighborhood could have a housing stock split between flats (with little or no garden space) and houses with large gardens. It might be possible to explore this issue further using the ONS green space data (included within the FoE dataset), as this does distinguish between flats and houses within each neighborhood.
  • There is also the question of how the value (in its broadest sense, encompassing the types of benefits discussed above) of additional garden space changes as the amount of garden space increases. It seems reasonable to assume that if an individual moves from having 0 to 25 m2 of garden space the well-being benefits would be significant. However, it is less clear that an individual moving from, say, 200 to 225 m2 of garden space would see significant benefits.
  • There is an implicit assumption in this analysis that garden space is green space. However, some gardens are likely to be predominantly gray (e.g. paved) and some are likely to be poorly managed (e.g. overgrown). So, it might be interesting in future work to consider how the quality of garden space might be modeled, and how this might inform additional analysis of the garden area data.
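The first two points above can be illustrated with made-up numbers. The following Python sketch uses a hypothetical neighborhood in which a mean above the 50 m2 benchmark coexists with most residents having no garden at all. It also uses a hypothetical concave "benefit" function (log1p, chosen only for illustration, not an established model) to show what diminishing returns would mean in practice.

```python
import math
import statistics

# Hypothetical neighborhood of 100 residents (one per dwelling, for simplicity):
# 60 live in flats with no garden, 40 live in houses with 150 m2 gardens.
garden_per_person = [0.0] * 60 + [150.0] * 40

mean_garden = statistics.mean(garden_per_person)      # 60.0, above the 50 m2 benchmark
median_garden = statistics.median(garden_per_person)  # 0.0, the typical resident has no garden
share_without_garden = 100 * garden_per_person.count(0.0) / len(garden_per_person)  # 60.0


def benefit(m2: float) -> float:
    """Hypothetical concave value function: each extra m2 adds less than the last."""
    return math.log1p(m2)


gain_from_0_to_25 = benefit(25) - benefit(0)
gain_from_200_to_225 = benefit(225) - benefit(200)
```

Under this assumed function, the first 25 m2 are worth more than twenty times as much as the step from 200 to 225 m2. The real shape of the curve is exactly the open question raised above.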

Public green space area per capita

The proportion of population within the neighborhood within 5 minutes walks of public green space

Correlations between summary variables

Understanding the role of the underlying variables

Summary / Conclusions

Further work

The distribution of green space deprivation ratings across England

Green space area per capita

Dealing with outliers: the boxplot shows outliers as values above Q3 + 1.5*IQR. They are part of the natural variability of the population, so it seems appropriate to retain the outliers, but to zoom in on the graphs because the .

Not sure whether or not to filter out outliers.
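The boxplot outlier rule can be made explicit as an upper fence at Q3 + 1.5 × IQR. The following minimal Python sketch separates values above that fence from the rest without discarding anything. It assumes type-7 quartiles. Hinge definitions vary between tools, so the fence may differ slightly from the one drawn in a given plot.

```python
import statistics
from typing import List, Sequence, Tuple


def upper_fence(values: Sequence[float], k: float = 1.5) -> float:
    """Q3 + k * IQR, using type-7 (inclusive) quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def split_outliers(values: Sequence[float], k: float = 1.5) -> Tuple[List[float], List[float]]:
    """Return (values within the fence, values above it); nothing is dropped."""
    fence = upper_fence(values, k)
    return [v for v in values if v <= fence], [v for v in values if v > fence]
```

Keeping both lists leaves the decision to filter open. For example, the outliers can be counted, or their population share compared, before deciding whether to exclude them.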

So, I wondered if the outliers/very long tail are a result of areas with small populations and/or very large areas of green space.

So, it looks like it is the green space area, rather than population, that has much more influence on green space area per capita.
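Because green space area per capita is simply area divided by population, taking logs turns the ratio into a difference. This gives one way to make the observation above more precise. The notation is introduced here for illustration only: A_i denotes green_space_area and P_i the population of MSOA i.

```latex
\log\!\left(\frac{A_i}{P_i}\right) = \log A_i - \log P_i
\qquad\Longrightarrow\qquad
\operatorname{Var}\!\left[\log\frac{A}{P}\right]
  = \operatorname{Var}[\log A] + \operatorname{Var}[\log P] - 2\,\operatorname{Cov}[\log A, \log P]
```

If the variance of log A across MSOAs is much larger than that of log P, the spread of log green space area per capita will largely reflect log A. This seems plausible, since MSOAs are designed to have broadly similar populations.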

So, let's look at the distribution of green_space_area itself. This is relatively tricky given the wide range of values (as shown in the summary stats). I tried histograms and density plots too, but a box plot seemed the best way to understand the distribution. The first boxplot shows the full distribution and, as a result, is very difficult to interpret: the large outliers to the right of the plot compress the box itself into what appears to be a single line. In the second plot the x axis is cropped, so it is straightforward to interpret the box component of the plot. However, this comes at the cost of failing to show the very large outliers within the distribution.

The extreme skew of the distribution can be seen in the summary statistics below. The median for green_space_area is 152,418 m2, while the maximum is 636,087,671 m2.
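A summary of this kind can be reproduced in a few lines of Python. The following minimal sketch mirrors the six figures that R's summary() reports for a numeric vector, assuming type-7 quartiles. Comparing the mean with the median, or the maximum with the median, gives a quick numerical sense of the skew. Using the figures above, the maximum green_space_area is roughly 4,170 times the median.

```python
import statistics
from typing import Dict, Sequence


def r_style_summary(values: Sequence[float]) -> Dict[str, float]:
    """Min, quartiles, mean and max, labelled as in R's summary() output."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


def max_to_median_ratio(values: Sequence[float]) -> float:
    """A crude skew indicator: how many times larger the maximum is than the median."""
    return max(values) / statistics.median(values)
```

For a right-skewed variable such as green_space_area, the mean sits well above the median, because a few very large values pull it upwards.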

A similarly extreme right-skewed distribution can be seen for green_space_area_per_capita, as shown in the plots and summary stats below. It is worth noting just how atypical many of the larger outliers are: the median for green_space_area_per_capita is less than 20 m2 per capita, while the maximum is approximately 100,000 m2 per capita.

So, I thought it was worth taking a quick look at the population density across English MSOAs. The first graph shows a kernel density estimate for the population density of English MSOAs. Key features of the distribution include:

The second plot groups MSOAs by their FoE green space deprivation rating and highlights:

Plotting population density against green space area and green space area per capita produces very associations. Note the log scales on both the x and y axes in both cases.
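A numerical companion to these log-log plots is the Pearson correlation computed on log10-transformed values. The following minimal Python sketch implements this calculation. Pairs with a zero or negative value are dropped, because their logarithm is undefined. This mirrors what a log-scaled plot does with them. The function names are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation on log10 scales, dropping pairs where either value is not positive."""
    pairs = [(math.log10(a), math.log10(b)) for a, b in zip(x, y) if a > 0 and b > 0]
    lx, ly = zip(*pairs)
    return pearson(lx, ly)
```

Applied to population density against green_space_area_per_capita, a strongly negative value would be consistent with denser MSOAs having less public green space per person. The value summarises only the linear trend on the log scales.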

Green space access

Garden space

Urban-rural classification and green space deprivation ratings

Rural-urban classification at LA scale

https://www.gov.uk/government/statistics/local-authority-rural-urban-classification Rural-Urban Classification of Local Authorities Post-2009 Boundaries

Rural-urban classification at MSOA scale

https://geoportal.statistics.gov.uk/datasets/rural-urban-classification-2001-of-msoas-in-england-and-wales urban_rural_classification

Some thoughts on where I am in understanding the FoE ratings and green_space_area:

An alternative approach - clustering

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Standardization-in-Cluster-Analysis/ta-p/302296

My initial efforts at transforming the data (a log transformation followed by scaling values to the unit interval, i.e. 0…1) proved rather unsuccessful. As the summary stats below show, the transformed values remain tightly grouped together around the median.
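The transformation described above can be sketched as follows, as a minimal Python example. The +1 offset, which keeps zero areas finite, is an assumption. The exact transformation used in the notebook is not shown. With a few extreme outliers, min-max scaling maps the outliers to the top of the range and compresses everything else towards the bottom. This is consistent with the values bunching around the median.

```python
import math
from typing import List, Sequence


def log_min_max(values: Sequence[float]) -> List[float]:
    """log10(v + 1), then rescale linearly so the smallest value is 0 and the largest is 1."""
    logs = [math.log10(v + 1) for v in values]  # +1 offset is an assumption for zeros
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [0.0 for _ in logs]  # all values identical: nothing to scale
    return [(x - lo) / (hi - lo) for x in logs]
```

Because the minimum and maximum alone set the scale, a single MSOA with hundreds of millions of m2 determines where every other MSOA lands. Robust alternatives, such as centring on the median and dividing by the interquartile range, are one option to try.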

So, I wondered about focusing on a subset of the data which could be easier to work with. Given the focus on green space deprivation, it perhaps makes sense to remove those MSOAs that are clearly green space affluent (e.g. those with tens of thousands of m2 of public green space per capita).
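The following sketch combines the subsetting idea with one common formulation of k-medians clustering. Points are assigned to the nearest centre under Manhattan (L1) distance. Each centre is then moved to the coordinate-wise median of its members. The filter threshold of 10,000 m2 per capita, the field name and the choice of initial centres are all assumptions made for illustration. In practice the features would be transformed or standardised first. This is a minimal Python sketch, not a tuned implementation.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def drop_green_space_affluent(rows: List[Dict[str, float]],
                              threshold: float = 10_000.0) -> List[Dict[str, float]]:
    """Keep MSOAs at or below a hypothetical per capita public green space threshold (m2)."""
    return [r for r in rows if r["green_space_area_per_capita"] <= threshold]


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points: Sequence[Point], initial_centres: Sequence[Point],
              max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    centres: List[Point] = [tuple(float(v) for v in c) for c in initial_centres]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centre under L1 (Manhattan) distance
        labels = [min(range(len(centres)), key=lambda j: manhattan(p, centres[j]))
                  for p in points]
        # Update step: coordinate-wise median of each cluster's members
        new_centres: List[Point] = []
        for j, centre in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(float(statistics.median(dim)) for dim in zip(*members)))
            else:
                new_centres.append(centre)  # leave an empty cluster's centre in place
        if new_centres == centres:  # converged: centres stopped moving
            break
        centres = new_centres
    return centres, labels
```

Medians make the centres less sensitive to the extreme values discussed above than the means used by k-means. However, the result still depends on the starting centres, so several initialisations would be worth comparing.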

The distribution of green space deprivation ratings across the regions

The geographic distribution of green space deprivation ratings

https://datacarpentry.org/r-raster-vector-geospatial/06-vector-open-shapefile-in-r/

A quick visual inspection of the MSOAs, colored by their green space deprivation rating, shows a similar pattern across the regions (with the exception of London). The D and E ratings (oranges and reds) occur in smaller (presumably more densely populated) MSOAs which make up urban areas, while the larger, more rural MSOAs tend to be less green space deprived and have A or B ratings. Given that the whole region of London would probably be considered a continuous urban space, it is unsurprising to observe many green space deprived MSOAs across the region/plot, with relatively few less green space deprived areas present.

Ethnicity and green space deprivation

Wealth and green space deprivation

Health and green space deprivation

Covid and green space deprivation

Archived code

Ideas:

Russo, Alessio, and Giuseppe Cirella. 2018. “Modern Compact Cities: How Much Greenery Do We Need?” International Journal of Environmental Research and Public Health 15 (10): 2180. https://doi.org/10.3390/ijerph15102180.